Oh this is it.
Perfect.
I think these extra layers are going to make it so much better.
Oh yeah, increasing the size of this layer was a really good idea.
Alright.
Okay.
I can’t wait any longer.
It’s time to test it.
John-Green-Bot: Ja-bril. Jabril.
I wrote a nov-el.
Jabril: Whoa, John-Green-Bot... you did what?
John Green Bot: I wrote a novel.
A novel?
Let me see this.
Wow...
John Green Bot, this is pretty sloppy; we need to work on your handwriting.
Hold up.
Hold up.
You wrote one letter per page!
This is impossible to read.
John Green Bot we’ve got to get your novel an audience so let’s digitize this using
machine learning!
But first, there’s something else we have to test.
[INTRO - LAB]
Welcome back to Crash Course AI!
I’m your host Jabril, and today we’ll be doing something a little different.
This is the first time we’re trying a hands-on lab on Crash Course, so we’ll tackle a project
together and program a neural network to recognize handwritten letters.
Alright John Green Bot we’ll get back to you when we’ve got something.
We’ll be writing all of our code using a language called Python in a tool called Google
Colaboratory.
You can see the code we’re about to go over in your browser from the link we put in the
description, and you can follow along with me in this video.
In these Colaboratory files, there’s some regular text explaining what we’re trying
to do, and pieces of code that we can run by pushing the play button.
These pieces of code build on each other, so keep in mind that we have to run them in
order from top to bottom, otherwise we might get an error.
To actually run the code and experiment with changing it you have to either click “open
in playground” at the top of the page OR open the File Menu and click “Save a Copy
to Drive.”
And just fyi, you’ll need a Google account for this.
Remember, our goal is to program a neural network to recognize handwritten letters and
convert them to typed text.
Even though this stack of paper is unreadable to me, we can work with it, and it could actually
make our project a little easier.
Usually with a project like this, we’d have to write code to figure out where one letter
ends and another begins, because handwriting can be messy and uneven.
That’s called the segmentation problem.
But because John-Green-bot wrote his novel like this, the letters are already segmented,
and we can just focus on recognizing the letter on each page.
By the way, avoiding the segmentation problem is also why official forms sometimes have
little boxes for each letter, instead of just a line for writing your name.
Even though we don’t have to worry about segmentation, recognizing handwritten letters
and converting them to typed text is still tricky!
Every handwritten “J” looks a little different, so we need to program our neural network to
recognize a pattern instead of memorizing a specific shape.
But before we do this, let’s think about what we need to get there.
Neural networks need a lot of labeled data to learn what each letter generally looks
like.
So Step 1 is: find or create a labeled dataset to train our neural network.
And this involves splitting our dataset into the training set and the testing set.
The training set is used to train the neural network.
And the testing set is data that’s kept hidden from the neural network during training,
so it can be used to check the network’s accuracy.
Next, Step 2 is create a neural network.
We’ll actually need to configure an AI with an input layer, some number of hidden layers,
and the ability to output a number corresponding to its letter prediction.
In Step 3, we’ll train, test, and tweak our code until we feel that it’s accurate
enough.
And finally in Step 4, we’ll scan John-Green-bot’s handwritten pages and use our newly trained
neural network to convert them into typed text!
Alright, let’s get started.
Step 1.
Creating a labeled dataset can be a huge and expensive challenge, especially if I have
to handwrite and label thousands of images of letters by myself.
Luckily, there’s already a dataset that we can use: the Extended Modified National
Institute of Standards and Technology dataset, or EMNIST for short.
This dataset has tens of thousands of labeled images of handwritten letters and numbers,
generated from US Census forms.
Some of the handwriting is relatively neat and some... not so much.
We’re going to use the EMNIST letters chunk of the dataset, which has 145,600 images of
letters, because we’re only recognizing letters in John-Green-bot’s book, not numbers.
This code here will give our program access to this dataset, also called importing it.
So now we need to make sure to keep our training and our testing datasets separate, so that
when we test for accuracy, our AI has never seen the testing images before.
So now in our code at step 1.2 let’s call the first 60,000 labeled images “train”
and the next 10,000 labeled images “test.”
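The slicing in step 1.2 looks roughly like this; the arrays here are zero-filled stand-ins, since the real notebook loads the actual EMNIST images:

```python
import numpy as np

# Zero-filled stand-ins for the real EMNIST arrays the notebook loads.
images = np.zeros((70000, 28, 28), dtype=np.uint8)
labels = np.zeros(70000, dtype=np.uint8)

# Step 1.2: the first 60,000 labeled images become "train",
# the next 10,000 become "test".
train_images, train_labels = images[:60000], labels[:60000]
test_images, test_labels = images[60000:70000], labels[60000:70000]
```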
These images of letters are 28x28 pixels, and each pixel is a grayscale value between
0 and 255.
To normalize each pixel value and make them easier for the neural network to process,
we’ll divide each value by 255.
That will give us a number between 0 and 1 for each pixel in each image.
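That division is a single numpy operation, sketched here on a tiny hypothetical image patch:

```python
import numpy as np

# A tiny hypothetical image patch with grayscale values between 0 and 255.
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.float64)

# Normalize: every pixel becomes a value between 0 and 1.
normalized = image / 255.0
```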
Performing a transformation like this to make the data easier to process is a machine
learning method called preprocessing.
By the way, we’ll need different preprocessing steps for different types of data.
Alright, it may take a few seconds to download and process all of these images, so while
that’s happening, I want to clarify that EMNIST is a luxury.
There aren’t many already-existing datasets where you have this much labeled data to use.
In general, if we try to solve other problems, we’ll have to think hard about how to collect
and label data for training and testing our networks.
Data collection is a very important step to training a good neural network!
In this case though, we’ve got plenty to use in both sets.
Okay, let’s write a little piece of code to make sure that we imported our dataset
correctly.
This line lets us display an image and we’ll also display the label using the print command.
See, this letter is labeled as a Y.
We can display a different example by changing this index number, which tells our program
which letter image in the EMNIST dataset to pull.
Let’s look at the image indexed at 1200… this is labeled as a W.
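In the notebook, that check is just a couple of lines with matplotlib. Here's a sketch using made-up stand-in arrays, since the real cell indexes into the downloaded EMNIST data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render off-screen; not needed inside Colab
import matplotlib.pyplot as plt

# Made-up stand-ins for the EMNIST training arrays.
train_images = np.random.rand(2000, 28, 28)
train_labels = np.array([chr(ord("A") + i % 26) for i in range(2000)])

index = 1200                     # change this to pull a different example
plt.imshow(train_images[index], cmap="gray")
print(train_labels[index])       # prints the label stored for this image
```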
These are already labeled images.
There’s no neural network making any decisions yet, but this /is/ a labeled dataset, so we’re
done with the first step!
Step 2
Now that we have our dataset, we need to actually build a neural network, but we don’t need
to reinvent the wheel here!
We’re going to stick with a multi-layer perceptron neural network or MLP, which is
the kind we’ve focused on in the Neural Networks and Deep Learning Episodes.
There are already some tools in Python called libraries that we can use to help make the
network.
We’re going to use a library called SKLearn (which is short for scikit-learn), so we’ll
import that to have access to it.
SKLearn includes a bunch of different machine learning algorithms, and we’ll be using
its Multi-Layer Perceptron algorithm in this lab.
So, our neural network is going to have images of handwritten letters as inputs.
Each image from EMNIST is 28 pixels by 28 pixels, and each of these pixels will be represented
by a single input neuron, so we’ll have 784 input neurons in total.
Depending on how dark a particular pixel is, it will have a greyscale value between 0 and
1, thanks to the processing we did earlier.
The size of our output layer depends on the number of label types that we want our neural
network to guess.
Since we’re trying to guess letters and there are 26 letters in the English alphabet,
we’ll have 26 output neurons.
We don’t actually have to tell the network this, though: it will figure this out on
its own from the labels in the training set.
For the structure of the hidden layers, we’ll just start experimenting to see what works.
We can always change it later.
So we’ll try a single hidden layer containing 50 neurons.
Over the span of one epoch of training this neural network, each of the 60,000 images
in the training dataset will be processed by the input neurons, the hidden layer neurons
will randomly pick some aspects of each image to focus on, and the output neurons will hold
the best guess as to whether each image is a particular letter.
You’ll see that the code in our Colab notebook calls this an “iteration.”
In the specific algorithm we’re using, an iteration and an epoch are the same thing.
After each of the 60,000 images are processed, the network will compare its guess to the
actual label and update weights and biases to give a better guess for the next image.
And over multiple epochs of the same training dataset, the neural network’s predictions
should keep getting better thanks to those updated weights and biases.
We’ll just go with 20 epochs for now.
We’ve captured all that in a single line of code in step 2.1, which creates a neural
network with a single hidden layer of 50 neurons that will be trained over 20 epochs.
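With SKLearn, that one line is a call to `MLPClassifier`. The exact extra arguments in the notebook may differ, but the shape of the call is:

```python
from sklearn.neural_network import MLPClassifier

# Step 2.1 (sketch): one hidden layer of 50 neurons, up to 20 training
# epochs. In this algorithm, max_iter counts epochs/iterations.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, verbose=True)
```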
This is why libraries can be so useful: we’re accessing decades of research with one line
of code!
But, keep in mind, there are cons to using libraries like this as well.
We don’t have a lot of control over what’s happening under the hood here.
When solving most problems, we’ll want to do a mix of using existing libraries and writing
our own AI algorithms, so we would need a lot more than just one line of code.
For this lab, though, step 2 is done.
Step 3.
Next, we want to actually train our network over those 20 epochs and see how well it guesses
the letters in the training and testing datasets, with this line of code in step 3.1!
For every epoch, our program prints a number called the error of the loss function.
This basically represents how wrong the network was overall.
We want to see this number going down with each epoch.
The number that we /really/ care about is how well the network does on the testing dataset,
which shows how good our network is at dealing with data that it’s never seen before.
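Training and scoring are two method calls. As a runnable sketch, here's the same pattern on sklearn's small bundled digits dataset instead of EMNIST (downloading EMNIST here would be slow, so the accuracy numbers will differ from the lab's):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: the 8x8 digit images bundled with sklearn.
X, y = load_digits(return_X_y=True)
X = X / 16.0                                  # normalize pixels to 0-1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, random_state=1)
mlp.fit(X_train, y_train)                     # the training loop over the epochs
accuracy = mlp.score(X_test, y_test)          # fraction correct on unseen data
print("test accuracy:", accuracy)
```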
And we have 84% correct!!
Now that’s not bad considering we only trained for 20 epochs, but we still want to improve
it.
To see where the network made the most mistakes, we can create a confusion matrix which we
made in step 3.2.
The color of each cell in the confusion matrix represents the number of elements in that
cell, and a brighter color means more elements.
The rows are the correct value and the columns are the predicted value, and the numbers on
the axes represent the 26 letters in the alphabet.
So, zero is “A”, one is “B”, etc.
So cell (0,0) represents the number of times that our network correctly predicted that
an "A" is an "A." It’s good to see a bright diagonal line, because those are all correct
values!
But other bright cells are mislabeled, so we should check if there are any patterns.
For example, "I" and "L" may be easy to confuse, so let's look at some of the cases where that
happened.
We can also try other types of errors, like every time our network guesses that a “U”
is a “V.” 37 times.
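The confusion matrix itself comes from SKLearn. A tiny hypothetical example with three letter classes shows how to read it:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for three classes: 0 = "A", 1 = "B", 2 = "C".
y_true = np.array([0, 0, 1, 1, 2, 2])   # the correct labels
y_pred = np.array([0, 0, 1, 2, 2, 2])   # the network's guesses

cm = confusion_matrix(y_true, y_pred)
# Rows are the correct value and columns the predicted value, so
# cm[1, 2] counts how often a true "B" was guessed as "C".
```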
To see if we can improve our accuracy, we can program a slightly different neural network.
More epochs, more hidden layers, and more neurons in the hidden layers could all help,
but the tradeoff is that things will be a bit slower.
We can play around with the structure here to see what happens.
For now, let’s try creating a neural network that has 5 hidden layers of 100 neurons each,
and we’ll train it over 50 epochs.
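Structurally, that's just a longer `hidden_layer_sizes` tuple and a bigger `max_iter`; any other arguments the notebook passes are omitted in this sketch:

```python
from sklearn.neural_network import MLPClassifier

# 5 hidden layers of 100 neurons each, trained over 50 epochs.
mlp2 = MLPClassifier(hidden_layer_sizes=(100, 100, 100, 100, 100),
                     max_iter=50)
```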
It’ll take a few minutes to run.
Now we’ve got better accuracy rates on our testing dataset: we have 88% correct instead
of 84% correct, and that’s an improvement!
Over time, we can develop an intuition about how to structure neural networks to achieve
better results.
See if you can create a network that has a higher accuracy than ours on the testing dataset.
But, for now, we’re gonna move forward with this trained network.
Step 4.
The final step is our moment of truth.
We’re going to use our trained neural network to try and read John-Green-bot’s novel,
so let’s dig into this stack of papers.
First, we’ve got to get our data in the right format by scanning all these papers.
And done. And because we’re using Google Colab, we need to get them online.
We’re storing them in a GitHub repository which we coded to import into our Colaboratory
notebook.
But, as you can see, those scanned images are HUGE.
So we've also done a bit of preprocessing on these scans to avoid having to download
and compute over so much data.
We've changed the size of every image to 128x128 pixels.
The other thing you may notice is that the EMNIST dataset uses a dark background with
light strokes, but our original scans have a white background with dark strokes.
So, we also went ahead and inverted the colors to be consistent with EMNIST.
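Both of those fixes are quick numpy operations. This sketch inverts a hypothetical white-background scan so the ink becomes light on dark, like EMNIST:

```python
import numpy as np

# Hypothetical scan: 255 = white paper, 0 = dark ink.
scan = np.full((128, 128), 255, dtype=np.uint8)
scan[40:90, 60:70] = 0            # a dark pen stroke

# Invert the colors: now the stroke is light on a dark background.
inverted = 255 - scan
```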
Alright.
Now, back to the Colab notebook.
So this code right here in Step 4.1 will pull the modified letters from GitHub.
Now, we’ll read them into an array and display one of them, just to make sure we’re able
to import correctly.
This looks pretty good!
Clearer than the EMNIST data actually.
But back to why we’re doing this in the first place: let’s see if we can
process John Green Bot’s story now.
Ummm…
This is not making any sense…
So we’re doing something wrong.
First off, John-Green-Bot's story had some empty spaces between words.
We never actually trained our model on empty spaces, just the 26 letters, so it wouldn't
be able to detect these.
But blank pages should be easy to detect.
After all, unlike handwritten letters, all blank images should be exactly the same.
So, we'll just check each image to see if it’s a blank space, and if it is, we'll
add a space to our story.
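Since an inverted blank page is all-zero pixels, the check can be as simple as this sketch (the function and names here are hypothetical, not the notebook's):

```python
import numpy as np

def image_to_char(image, predict_letter):
    """Return a space for a blank page, else the network's guess."""
    if not image.any():              # every pixel is 0 -> blank page
        return " "
    return predict_letter(image)     # hand non-blank pages to the model

blank = np.zeros((28, 28))
letter = np.zeros((28, 28))
letter[10:18, 12:16] = 1.0           # some "ink"

# With a dummy predictor that always guesses "T":
text = image_to_char(blank, lambda img: "T") + image_to_char(letter, lambda img: "T")
```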
This looks better.
There are separate words and I can tell that the first word is "The," but not much beyond
that.
So something else isn’t going right here.
Well, even though the letters on the papers we scanned look clear to my human eyes, the
images were really big compared to the handwritten samples that were used to train EMNIST.
We resized them, but that doesn't seem to be enough.
To help our neural network digitize these letters, we should try processing these images
in the same way that EMNIST did.
Let’s do a little detective work to figure out how the EMNIST dataset was processed,
so our images are more similar to the training dataset, and our program’s accuracy will
hopefully get better.
Hmmm….
“Further information on the dataset contents and conversion process can be found in the
paper.”
We’re not going to go through the paper, but we’ll link it in the description if
you want to learn more.
Basically, I made the following additions to the code.
We’re applying some filters to the image to soften the letter edges, centering each
letter in the square image, and resizing each one to be 28x28 pixels.
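As a sketch of those additions (the real pipeline follows the EMNIST paper; this version leans on scipy and simplifies the details):

```python
import numpy as np
from scipy import ndimage

def emnist_style(image):
    """Roughly mimic EMNIST preprocessing on a square grayscale image."""
    image = ndimage.gaussian_filter(image, sigma=1)       # soften the edges
    cy, cx = ndimage.center_of_mass(image)                # where the ink sits
    rows, cols = image.shape
    image = ndimage.shift(image, (rows / 2 - cy, cols / 2 - cx))  # recenter
    return ndimage.zoom(image, 28 / rows)                 # shrink to 28x28

letter = np.zeros((128, 128))
letter[30:90, 50:60] = 1.0            # a hypothetical pen stroke
processed = emnist_style(letter)      # now shaped like an EMNIST image
```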
As part of this code, we’re also displaying one letter from these extra-processed images,
to do another check.
Even though to MY eyes, the letter looks less clear now, it does look much more similar
to the letters in the EMNIST dataset, which is good for our neural network.
The edges of the letters are kind of fuzzy and it’s centered in the square.
So, let’s try processing this story ONE more time.
Keep in mind that with an 88% accurate model, we expect to get about 1 in 8 letters wrong
in the story. John-Green-bot, are you ready?
Alright let’s see what you were talking about.
“The Fault in Our Power Supplies”
“I fell in love the way your battery dies, slowly and then all at once”
Quite poetic John Green Bot.
Okay, it’s not perfect, but it was pretty easy to figure out with context and by knowing
which letters might be mistaken for each other.
Regardless, thanks John Green Bot for giving us a little taste of your first novel.
And thank you for following along in our first Crash Course Lab.
Let us know in the comments how you think you could improve the code and tell us if
you use it in any of your own projects.
Now, this kind of supervised machine learning is a big component of the AI Revolution, but
it’s not the only one!
In later videos, we’ll be looking at other types of machine learning, including unsupervised
and reinforcement learning, to see what we can do even without a giant labeled dataset.
See you then.
Crash Course AI is produced in association with PBS Digital Studios.
If you want to help keep Crash Course free for everyone, forever, you can join our community
on Patreon.
And if you want to learn more about the basics of programming in any language, check out
this Crash Course Computer Science video.